Failure-Resilient Computations in the EcliPSe System

Authors

  • Felipe Knop
  • Vernon Rego
  • Vaidy S. Sunderam
  • Adam Ferrari
Abstract

Computation systems based on local- or wide-area connected workstation clusters are inherently failure-prone, particularly for long-running computations. In this work we introduce a variety of features for failure resilience in the EcliPSe system for replicative applications. Key characteristics of fault-tolerant EcliPSe are ease of use, low state-saving costs, system scalability, and good performance.

Similar resources

Fail-safe concurrency in the EcliPSe system

Local or wide-area heterogeneous workstation clusters are relatively cheap and highly effective, though inherently unstable operating environments for long-running distributed computations. We found this to be the case in early experiments with a prototype of the EcliPSe system, a software toolkit for replicative applications on heterogeneous workstation clusters. Hardware or network failures i...

Verification of Monitor unit calculations for Eclipse Treatment Planning System by in-house developed spreadsheet

Introduction: Computerized treatment planning is a rapidly evolving modality that depends on hardware and software efficiency. Although ICRU recommendations suggest a 5% deviation in dose delivery, the overall uncertainty should be less than 3.5%, as suggested by B.J. Mijnheer. In-house spreadsheets are developed by medical physicists to cross-verify the dose calculated by the Treatment Pl...

On the Effectiveness of Superconcurrent Computations on Heterogeneous Networks

Concurrent computing on networked collections of computer systems is rapidly evolving into a viable technology that is attractive from the economic, performance, and availability perspectives. Several software infrastructures that support such heterogeneous network-based concurrent computing have evolved, and are in use for production-quality high-performance computing. In this paper, we descri...

Rexsss Performance Analysis: Domain Decomposition Algorithm Implementations for Resilient Numerical Partial Differential Equation Solvers

The future of extreme-scale computing is expected to magnify the influence of soft faults as a source of inaccuracy or failure in solutions obtained from distributed parallel computations. The development of resilient computational tools represents an essential recourse for understanding the best methods for absorbing the impacts of soft faults without sacrificing solution accuracy. The Rexsss ...

Resilient Configuration of Distribution System versus False Data Injection Attacks Against State Estimation

State estimation is used in power systems to estimate grid variables based on meter measurements. Unfortunately, power grids are vulnerable to cyber-attacks. Reducing cyber-attacks against state estimation is necessary to ensure safe and reliable power system operation. False data injection (FDI) is a type of cyber-attack that tampers with measurements. This paper proposes network reconfigurati...

Publication date: 1994